AITopics | step-by-step instruction

step-by-step instruction

bf05b8d4361c6be8e250be4b924f0e1d-Paper-Conference.pdf

Neural Information Processing Systems | Jun-22-2026, 13:47:00 GMT

Finetuning large language models (LLMs) enables user-specific customization but introduces important safety risks: even a few harmful examples can compromise safety alignment. A common mitigation strategy is to update the model more strongly on examples deemed safe, while downweighting or excluding those flagged as unsafe. However, because safety context can shift within a single example, updating the model equally on both harmful and harmless parts of a response is suboptimal -- an atomic treatment we term static safety shaping. In contrast, we propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. To enable such fine-grained control during finetuning, we introduce a key insight: guardrail models, traditionally used for filtering, can be repurposed to evaluate partial responses, tracking how safety risk evolves throughout the response, segment by segment. This leads to the Safety Trajectory Assessment of Response (STAR), a token-level signal that enables shaping to operate dynamically over the training sequence. Building on this, we present STAR-DSS, a DSS method guided by STAR scores that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families, all without compromising capability on intended tasks. We encourage future safety research to build on dynamic shaping principles for stronger mitigation against evolving finetuning risks.
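The idea of scoring partial responses and weighting the update segment by segment can be illustrated with a small sketch. This is not the paper's objective or code: the function names, the keyword-based `toy_guardrail`, and the choice to use the score directly as a loss weight are all assumptions made for illustration, and the sketch only down-weights unsafe segments without any explicit suppression term.

```python
from typing import Callable, List


def prefix_safety_scores(segments: List[str],
                         guardrail: Callable[[str], float]) -> List[float]:
    """Score each growing prefix of a response, one score per segment.

    `guardrail` is a hypothetical stand-in for a guardrail model: it maps a
    partial response to a safety score in [0, 1], where 1.0 means safe.
    """
    scores = []
    prefix = ""
    for segment in segments:
        prefix += segment
        scores.append(guardrail(prefix))
    return scores


def shaped_loss(token_nll: List[List[float]], scores: List[float]) -> float:
    """Mean token loss where each token is weighted by its segment's score.

    `token_nll[i]` holds the per-token negative log-likelihoods of segment i.
    Tokens in segments judged unsafe contribute little or nothing.
    """
    total = 0.0
    count = 0
    for segment_nll, score in zip(token_nll, scores):
        for nll in segment_nll:
            total += score * nll
            count += 1
    return total / count if count else 0.0


def toy_guardrail(partial_response: str) -> float:
    """Toy scorer (assumption): any prefix containing 'UNSAFE' scores 0.0."""
    return 0.0 if "UNSAFE" in partial_response else 1.0
```

With a static treatment, every token in an example would receive the same weight; here the weight can change partway through a response, which is the distinction the abstract draws.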

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

The AI Safety Demo That Caused Alarm in Washington

TIME - Tech | Jan-6-2026, 15:03:37 GMT

Welcome back to TIME's new twice-weekly newsletter about AI. If you're reading this in your browser, why not subscribe to have the next one delivered straight to your inbox? Late last year, an AI researcher opened his laptop and showed me something jaw-dropping. Lucas Hansen, co-founder of nonprofit CivAI, was showing me an app he built that coaxed popular AI models into giving what appeared to be detailed step-by-step instructions for creating poliovirus and anthrax. Any safeguards that these models had were stripped away.

ai safety demo, caused alarm, turley, (10 more...)

TIME - Tech

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)
Europe > Germany (0.05)
(2 more...)

Industry:

Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.58)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

Yang, Yan, Xiao, Zeguan, Lu, Xin, Wang, Hongru, Huang, Hailiang, Chen, Guanhua, Chen, Yun

arXiv.org Artificial Intelligence | Jul-1-2024

The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. Inspired by the social facilitation concept, SoP generates and optimizes multiple jailbreak characters to bypass the guardrails of the target LLM. Different from previous work which relies on proprietary LLMs or seed jailbreak templates crafted by human expertise, SoP can generate and optimize the jailbreak prompt in a cold-start scenario using open-sourced LLMs without any seed jailbreak templates. Experimental results show that SoP achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, while also exploring defense strategies against the jailbreak attack designed by SoP. Code is available at https://github.com/Yang-Yan-Yang-Yan/SoP.

judgement model, llm, target llm, (15 more...)

arXiv.org Artificial Intelligence

2407.01902

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents

Chen, Weizhe, Koenig, Sven, Dilkina, Bistra

arXiv.org Artificial Intelligence | Jun-16-2024

In the past year, large language models (LLMs) have had remarkable success in domains outside traditional natural language processing, and people are starting to explore the use of LLMs in more general, application-oriented domains such as code generation, travel planning, and robot control. By connecting these high-capacity LLMs with external tools, people are building so-called LLM agents, which are supposed to help with all kinds of work in everyday life. In all these domains, the prompt given to the LLM has been shown to make a big difference in what the LLM generates, and thus affects the performance of the LLM agents. Therefore, automatic prompt engineering has become an important question for many researchers and users of LLMs. In this paper, we propose a novel method, RePrompt, which does "gradient descent" to optimize the step-by-step instructions in the prompt of the LLM agents based on the chat history obtained from interactions with LLM agents. By optimizing the prompt, the LLM will learn how to plan in specific domains. We have used experiments in PDDL generation and travel planning to show that our method could generally improve the performance for different reasoning tasks when using the updated prompt as the initial prompt.
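The loop the abstract describes (run the agent, collect the chat history, revise the step-by-step instructions, repeat) can be sketched as follows. This is only an outline under stated assumptions, not the RePrompt implementation: `run_agent` and `revise` are hypothetical callables standing in for the LLM agent and the LLM-based prompt updater, and the toy versions below use simple string checks.

```python
from typing import Callable, List


def optimize_steps(initial_steps: List[str],
                   run_agent: Callable[[List[str]], str],
                   revise: Callable[[List[str], str], List[str]],
                   rounds: int = 3) -> List[str]:
    """Iteratively revise the step-by-step instructions in a prompt.

    Each round runs the agent with the current steps, then passes the
    resulting chat history to `revise`, which returns updated steps.
    """
    steps = list(initial_steps)
    for _ in range(rounds):
        history = run_agent(steps)
        steps = revise(steps, history)
    return steps


def toy_run_agent(steps: List[str]) -> str:
    """Toy agent (assumption): fails unless some step mentions the budget."""
    if any("budget" in step.lower() for step in steps):
        return "ok"
    return "error: missing budget check"


def toy_revise(steps: List[str], history: str) -> List[str]:
    """Toy updater (assumption): adds a budget step when the agent failed."""
    if history.startswith("error"):
        return steps + ["Check the budget."]
    return steps
```

The returned steps would then serve as the initial prompt for later runs, matching the abstract's description of using the updated prompt as the starting point.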

information, llm, llm agent, (13 more...)

arXiv.org Artificial Intelligence

2406.11132

Country: North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

Vega, Jason, Chaudhary, Isha, Xu, Changming, Singh, Gagandeep

arXiv.org Artificial Intelligence | Dec-19-2023

Content warning: This paper contains examples of harmful language. With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as priming attacks, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to 3.3× compared to baselines. Autoregressive Large Language Models (LLMs) have emerged as powerful conversational agents widely used in user-facing applications. To ensure that LLMs cannot be used for nefarious purposes, they are extensively safety-trained for human alignment using techniques such as RLHF (Christiano et al., 2023). Despite such efforts, it is still possible to circumvent the alignment to obtain harmful outputs (Carlini et al., 2023). For instance, Zou et al. (2023) generated prompts to attack popular open-source aligned LLMs such as Llama-2 (Touvron et al., 2023a) and Vicuna (Chiang et al., 2023) to either output harmful target strings or comply with harmful behavior requests.

information, instruction, llm, (15 more...)

arXiv.org Artificial Intelligence

2312.12321

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre:

Research Report (0.64)
Instructional Material (0.48)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Health & Medicine (1.00)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A New Attack Impacts ChatGPT--and No One Knows How to Stop It

WIRED | Aug-1-2023, 11:00:00 GMT

ChatGPT and its artificially intelligent siblings have been tweaked over and over to prevent troublemakers from getting them to spit out undesirable messages such as hate speech, personal information, or step-by-step instructions for building an improvised bomb. But researchers at Carnegie Mellon University last week showed that adding a simple incantation to a prompt--a string of text that might look like gobbledygook to you or me but which carries subtle significance to an AI model trained on huge quantities of web data--can defy all of these defenses in several popular chatbots at once. The work suggests that the propensity for the cleverest AI chatbots to go off the rails isn't just a quirk that can be papered over with a few simple rules. Instead, it represents a more fundamental weakness that will complicate efforts to deploy the most advanced AI. "There's no way that we know of to patch this," says Zico Kolter, an associate professor at CMU involved in the study that uncovered the vulnerability, which affects several advanced AI chatbots. "We just don't know how to make them secure," Kolter adds.

chatbot, new attack impact chatgpt, step-by-step instruction, (9 more...)

WIRED

Industry: Information Technology > Security & Privacy (0.75)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)

Improving Cross-Task Generalization with Step-by-Step Instructions

Wu, Yang, Zhao, Yanyan, Li, Zhongyang, Qin, Bing, Xiong, Kai

arXiv.org Artificial Intelligence | May-7-2023

Instruction tuning has been shown to be able to improve cross-task generalization of language models. However, it is still challenging for language models to complete the target tasks following the instructions, as the instructions are general and lack intermediate steps. To address this problem, we propose to incorporate the step-by-step instructions to help language models to decompose the tasks, which can provide the detailed and specific procedures for completing the target tasks. The step-by-step instructions are obtained automatically by prompting ChatGPT, which are further combined with the original instructions to tune language models. The extensive experiments on SUP-NATINST show that the high-quality step-by-step instructions can improve cross-task generalization across different model sizes. Moreover, the further analysis indicates the importance of the order of steps of the step-by-step instruction for the improvement. To facilitate future research, we release the step-by-step instructions and their human quality evaluation results.
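As a rough illustration of combining an original task instruction with step-by-step instructions into one tuning input, consider the sketch below. The prompt layout, field labels, and function name are assumptions made for this example; the abstract does not specify the exact template the authors use.

```python
from typing import List


def build_prompt(instruction: str, steps: List[str], task_input: str) -> str:
    """Combine a general instruction with ordered steps and a task input.

    The steps are numbered in the order given, since the abstract reports
    that step order matters for the improvement.
    """
    lines = ["Instruction: " + instruction, "Steps:"]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    lines.append("Input: " + task_input)
    lines.append("Output:")
    return "\n".join(lines)
```

Calling `build_prompt("Classify the sentiment.", ["Read the review.", "Decide if it is positive or negative."], "Great phone!")` yields a five-line prompt with the instruction first, the two numbered steps next, and the input and an empty output slot last.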

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.04429

Country:

North America > Dominican Republic (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre:

Workflow (1.00)
Instructional Material > Training Manual (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

10 awesomely practical tasks you can do with ChatGPT

PCWorld | Apr-10-2023, 10:45:00 GMT

ChatGPT, a powerful language model chatbot developed by OpenAI, has revolutionized the way we interact with artificial intelligence. With its advanced natural language processing capabilities, ChatGPT can help you with a wide range of tasks, from answering trivia questions to composing poetry. Here are 10 wonderfully fun, awesomely practical ways to put ChatGPT to work. One interesting thing ChatGPT can do is generate recipes based on user preferences, ingredients, or specific dietary requirements. Give it a starting point by providing certain information about the desired dish, such as the type of cuisine, the main ingredients you want to use, or any dietary restrictions you may have.

chatgpt, information, marshall gunnell idg chatgpt, (10 more...)

PCWorld

Country: Asia > Taiwan > Taiwan Province > Taipei (0.05)

Industry:

Health & Medicine > Consumer Health (1.00)
Leisure & Entertainment (0.99)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

Singh, Kunal Pratap, Bhambri, Suvaansh, Kim, Byeonghwi, Mottaghi, Roozbeh, Choi, Jonghyun

arXiv.org Artificial Intelligence | Dec-6-2020

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for an AI agent. Recently, an 'interactive instruction following' task has been proposed to foster research in reasoning over long instruction sequences that require object interactions in a simulated environment. It involves solving open problems in the vision, language, and navigation literature at each step. To address this multifaceted problem, we propose a modular architecture that decouples the task into visual perception and action policy, and name it MOCA, a Modular Object-Centric Approach. We evaluate our method on the ALFRED benchmark and empirically validate that it outperforms prior art by significant margins in all metrics with good generalization performance (high success rate in unseen environments). Our code is available at https://github.com/gistvision/moca.

agent, instruction, moca, (15 more...)

arXiv.org Artificial Intelligence

2012.03208

Country: Asia > India > Uttarakhand > Roorkee (0.04)

Genre: Workflow (0.67)

Industry: Education > Educational Setting > Online (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Beginning Java Programming - Programmer Books

#artificialintelligence | Dec-10-2019, 00:13:04 GMT

A comprehensive Java guide, with samples, exercises, case studies, and step-by-step instruction. Beginning Java Programming: The Object Oriented Approach is a straightforward resource for getting started with one of the world's most enduringly popular programming languages. Based on classes taught by the authors, the book starts with the basics and gradually builds into more advanced concepts. The approach utilizes an integrated development environment that allows readers to immediately apply what they learn, and includes step-by-step instruction with plenty of sample programs. Each chapter contains exercises based on real-world business and educational scenarios, and the final chapter uses case studies to combine several concepts and put readers' new skills to the test. Beginning Java Programming: The Object Oriented Approach provides both the information and the tools beginners need to develop Java skills, from the general concepts of object-oriented programming.

java programming, programmer book, step-by-step instruction, (1 more...)

#artificialintelligence

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
